Spatially Clustered Regression
Shonosuke Sugasawa∗1 and Daisuke Murakami†
∗Center for Spatial Information Science, The University of Tokyo
†Department of Data Science, The Institute of Statistical Mathematics
Abstract
Spatial regression and geographically weighted regression models have been widely
adopted to capture the effects of auxiliary information on a response variable of
interest over a region. However, the relationships between response and auxiliary
variables are expected to exhibit complex spatial patterns in many applications.
This paper proposes a new approach for spatial regression, called spatially clustered
regression, to estimate possibly clustered spatial patterns of the relationships. We
combine a K-means-based clustering formulation and a penalty function motivated by
a spatial process known as the Potts model to encourage similar clustering in
neighboring locations. We provide a simple iterative algorithm to fit the proposed
method that is scalable for large spatial datasets. Through simulation studies, the
proposed method demonstrates superior performance over existing methods even when
the true structure does not admit spatial clustering. Finally, the proposed method
is applied to crime event data in Tokyo and produces interpretable results for
spatial patterns. The R code is available at https://github.com/sshonosuke/SCR.
Key words: Geographically weighted regression; K-means algorithm; Penalized likelihood; Potts model; Spatially varying parameters
1Corresponding author, Address: 5-1-5, Kashiwanoha, Kashiwa, Chiba 2778568, JAPAN, Email:
sugasawa@csis.u-tokyo.ac.jp
arXiv:2011.01493v2  [stat.ME]  28 Apr 2021

1 Introduction
Spatial heterogeneity, which is often referred to as the Second Law of Geography
(Goodchild, 2004), is ubiquitous in spatial science. Geographically weighted
regression (GWR; Brunsdon et al., 1998; Fotheringham et al., 2002), a representative
approach for modeling spatial heterogeneity, has been widely adopted for modeling
possibly spatially varying regression coefficients; its applications cover social
science (e.g. Hu et al., 2016), epidemiology (e.g. Nakaya et al., 2005) and
environmental science (e.g. Zhou et al., 2019).
Despite this success, GWR is known to be numerically unstable and may produce
extreme estimates of coefficients (e.g. Wheeler and Tiefelsdorf, 2005; Cho et al., 2009).
To address this drawback, a wide variety of regularized GWR approaches have been
developed (e.g. Wheeler, 2007, 2009; Bárcena et al., 2014). More recently, Comber
et al. (2016) considered local regularization to enhance accuracy and stability. Still,
it remains unclear how to regularize GWR to improve stability while maintaining its
computational efficiency. The Bayesian spatially varying coefficient model (Gelfand et al.,
2003; Finley, 2011) is another popular approach for modeling spatial heterogeneity
in regression coefficients. While Wheeler and Waller (2009) and Wolf et al. (2018),
among others, have suggested its stability and estimation accuracy, this approach can
be computationally very intensive for large samples, limiting applications of spatial
regression techniques to modern large spatial datasets. Therefore, an alternative
method with stable estimation performance, as well as computational efficiency
under large datasets, is strongly required.
This paper proposes a new effective approach for spatial regression with possibly
spatially varying coefficients or non-stationarity. Our fundamental idea is a
combination of regression modeling and clustering; we assume all the geographical
locations can be divided into a finite number of groups, where locations in the same
group share the same regression coefficients. Hence, possibly smooth surfaces of
varying regression coefficients are approximated by step functions. Owing to the
clustering technique, the estimation results are numerically stable and easier to
interpret than GWR. The idea of incorporating spatial clustering into regression
is not new. There have been some two-stage procedures (e.g. Anselin, 1990; Billé
et al., 2017; Lee et al., 2017; Nicholson et al., 2019), but they tend to be ad hoc
combinations of clustering and regression. In contrast, the proposed method carries
out regression and clustering simultaneously, which can produce reasonable spatial
clustering depending on regression structures.
To introduce such a spatial clustering nature, we employ indicators showing the
group to which each location belongs, and we estimate the grouping parameters
and group-wise regression models simultaneously. For estimating group memberships,
it is reasonable to impose that geographically neighboring locations are likely to
belong to the same group. To this end, we introduce a penalty function that
encourages such spatially clustered structures, motivated by the hidden Potts model
(Potts, 1952), which was originally developed for modeling spatially correlated
integers. We will demonstrate that the proposed objective function can be easily
optimized by a simple iterative algorithm similar to K-means clustering. In
particular, the updating steps in each iteration do not require computationally
intensive manipulations, so the proposed algorithm is much more scalable than GWR.
For selecting the number of groups G, we employ an information criterion. Moreover,
the proposed approach allows substantial extensions to include variable selection or
semiparametric additive modeling, which cannot be achieved by existing techniques
such as GWR.
Recently, sophisticated statistical methods combining regression modeling and
clustering have been studied in the literature. In the context of spatial regression,
Li and Sang (2019) and Zhao and Bondell (2020) adopted a fused lasso approach to
shrink differences of regression coefficients in neighboring areas toward 0, which
results in spatially clustered regression coefficients. However, the computational
cost under large datasets is substantial, and the performance is not necessarily
reasonable, possibly because the method does not take account of spatially
heterogeneous variances, as will be demonstrated in our numerical studies. On the
other hand, in the context of panel data analysis, clustering approaches using
grouping indicators like the proposed method have been widely studied (e.g.
Bonhomme and Manresa, 2015; Wang et al., 2018; Ito and Sugasawa, 2020). Still, the
existing works did not take account of spatial similarities among the grouping
indicators.
This paper is organized as follows. In Section 2, we introduce the proposed methods
and estimation algorithms, and discuss some related issues. In Section 3, we evaluate
the numerical performance of the proposed methods together with some existing
methods through simulation studies. In Section 4, we demonstrate the proposed
method through spatial regression modeling of the number of crimes in the Tokyo
metropolitan area. Finally, we give some discussion in Section 5.
2 Spatially Clustered Regression

2.1 Models and estimation algorithm
Let $y_i$ be a response variable and $x_i$ a vector of covariates in the $i$-th
location, for $i = 1, \ldots, n$, where $n$ is the number of samples. We suppose we
are interested in the conditional distribution $f(y_i \mid x_i; \theta_i, \psi)$,
where $\theta_i$ and $\psi$ are vectors of unknown parameters. Here $\theta_i$ may
change over different locations and represents spatial heterogeneity, while $\psi$
is assumed constant over all the areas. For example,
$f(y_i \mid x_i; \theta_i, \psi) = \phi(y_i; x_{i1}^t \theta_i + x_{i2}^t \gamma, \sigma^2)$
with $x_i = (x_{i1}, x_{i2})$ and $\psi = (\gamma, \sigma^2)$. We assume that
location information $s_i$ (e.g. longitude and latitude) is also available for the
$i$-th location. In what follows, we assume that there is no static parameter
$\psi$ for simplicity; all the results given below can be easily extended to the
general case.
Without any structure on $\theta_i$, we cannot identify these parameters, since
repeated measurements at the same location are rarely available in practice. Hence,
we assume that the $n$ locations are divided into $G$ groups, and locations in the
same group share the same parameter value of $\theta_i$. For a while, we treat $G$
as a fixed value, but a data-dependent selection of $G$ will be discussed later.
We introduce $g_i \in \{1, \ldots, G\}$, an unknown group membership variable for
the $i$-th location, and let $\theta_i = \theta_{g_i}$. Then, the distinct values of
the $\theta_i$'s reduce to $\theta_1, \ldots, \theta_G$, where
$\theta = (\theta_1^t, \ldots, \theta_G^t)^t$ is the set of unknown parameters.
Therefore, the unknown parameters in the model are the structural parameter
$\theta$ and the membership parameter $g = (g_1, \ldots, g_n)^t$.
Regarding the membership parameter, it is reasonable to expect that neighboring
locations are likely to have the same memberships, which means that the fitted
conditional distributions are likely to be the same in adjacent locations. To
encourage such a structure, we introduce a penalty function motivated by a spatial
process for discrete space known as the Potts model (Potts, 1952). The same penalty
function was first adopted by Sugasawa (2020) in mixture modeling. The joint
probability function of the Potts model is given by
$$\pi(g_1, \ldots, g_n \mid \phi) \propto \exp\Big( \phi \sum_{i<j} w_{ij} I(g_i = g_j) \Big),$$
where $w_{ij} = w(s_i, s_j) \in [0, 1]$, $w(\cdot, \cdot)$ is a weighting function,
and $\phi$ controls the strength of spatial correlation. Note that the normalizing
constant of the above distribution is not tractable. Still, we treat $\phi$ as a
fixed tuning parameter rather than an unknown parameter, so that we do not have to
deal with the normalizing constant in the following argument. Since the conditional
distribution of $g_i$ given the other variables has the same form as the one given
above, the conditional distribution puts more weight on $w_{ij} I(g_i = g_j)$ as
$\phi$ becomes larger. We then propose the following penalized likelihood:
$$Q(\theta, g) \equiv \sum_{i=1}^{n} \log f(y_i \mid x_i; \theta_{g_i}) + \phi \sum_{i<j} w_{ij} I(g_i = g_j). \quad (1)$$
The above objective function can be regarded as the logarithm of the joint
distribution function of $y_1, \ldots, y_n$ and $g$. We define the estimators of
$\theta$ and $g$ as the maximizers of the objective function $Q(\theta, g)$.
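As a concrete illustration (ours, not part of the authors' R code), the spatial penalty term in (1) can be computed directly from a membership vector and a symmetric weight matrix with zero diagonal:

```python
import numpy as np

def potts_penalty(g, W, phi=1.0):
    """Compute phi * sum_{i<j} w_ij * I(g_i == g_j).

    g : (n,) integer array of group memberships
    W : (n, n) symmetric weight matrix with entries in [0, 1], zero diagonal
    """
    n = len(g)
    iu = np.triu_indices(n, k=1)        # all pairs with i < j
    same = (g[iu[0]] == g[iu[1]])       # indicator I(g_i == g_j)
    return phi * float(np.sum(W[iu] * same))

# Two neighboring locations in the same group contribute w_ij to the penalty:
# pairs (0,1): w=1, same group; (0,2): w=0; (1,2): w=1, different groups.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
g = np.array([0, 0, 1])
print(potts_penalty(g, W))  # prints 1.0
```

Larger values of this term reward configurations in which strongly weighted neighbors share a group, which is exactly what the second term of (1) adds to the log-likelihood.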
To maximize the objective function (1), we can employ a simple iterative algorithm
similar to K-means clustering, which alternately updates the membership variables
$g$ and the other parameters. Each updating step is straightforward, since the
maximization of (1) given $g$ is the same as maximizing the log-likelihood function
based on the samples classified into each group. The detailed algorithm is given as
follows:
Algorithm 1. (Spatially clustered regression)

1. Set initial values $\theta^{(0)}$ and $g^{(0)}$.

2. Update the current parameter values $\theta^{(k)}$ and $g^{(k)}$ as follows:

   - Update the group-wise parameter $\theta_g$ separately for $g = 1, \ldots, G$:
     $$\theta_g^{(k+1)} = \arg\max_{\theta_g} \sum_{i=1}^{n} I(g_i^{(k)} = g) \log f(y_i \mid x_i; \theta_g).$$

   - Update the membership variable:
     $$g_i^{(k+1)} = \arg\max_{g \in \{1,\ldots,G\}} \Big\{ \log f(y_i \mid x_i; \theta_g^{(k+1)}) + \phi \sum_{j=1; j \neq i}^{n} w_{ij} I(g = g_j^{(k)}) \Big\}.$$

3. Repeat step 2 until convergence.
Note that the updating step for $\theta_g$ is easy when $f(y_i \mid x_i; \theta_g)$
is a standard regression model. For example, when $f(y_i \mid x_i; \theta_g)$ is a
Gaussian linear regression model, the updates for $\theta_g$ are obtained in closed
form. On the other hand, to update $g_i$, we just need to evaluate the penalized
likelihood function at all $g \in \{1, \ldots, G\}$, separately for each $i$, which
is not computationally intensive as long as $G$ is moderate. Therefore, each
updating step is easy to carry out and computationally inexpensive. Convergence of
the algorithm is monitored through the difference between the current and updated
values. The algorithm is terminated when the difference is smaller than a
user-specified tolerance $\varepsilon$; we used $\varepsilon = 10^{-6}$ in our
numerical studies.
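As a concrete sketch of Algorithm 1 for the Gaussian linear case (a Python illustration by us, not the authors' R implementation), the two updates alternate between group-wise least squares and a penalized membership sweep. The sweep below updates `g` in place (ICM-style), a minor variant of the stated update that holds all neighbors at $g^{(k)}$; it assumes a weight matrix with zero diagonal, uses random initialization, and monitors convergence via membership stability, so multiple restarts are advisable in practice.

```python
import numpy as np

def fit_scr_gaussian(y, X, W, G, phi=1.0, max_iter=100, seed=0):
    """Iterative SCR fit for a Gaussian linear model (sketch).

    y : (n,) responses; X : (n, p) design matrix (include an intercept column);
    W : (n, n) spatial weights in [0, 1] with zero diagonal; G : number of groups.
    Returns group coefficients (G, p) and memberships (n,).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    g = rng.integers(0, G, size=n)            # random initial memberships
    beta = np.zeros((G, p))
    sigma2 = np.ones(G)
    for _ in range(max_iter):
        g_old = g.copy()
        # Group-wise MLE: ordinary least squares on each group's samples.
        for k in range(G):
            idx = np.where(g == k)[0]
            if len(idx) < p:                  # skip (near-)empty groups
                continue
            bk, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
            beta[k] = bk
            resid = y[idx] - X[idx] @ bk
            sigma2[k] = max(float(np.mean(resid ** 2)), 1e-8)
        # Membership sweep: Gaussian log-likelihood plus the Potts-type bonus.
        for i in range(n):
            loglik = (-0.5 * np.log(2 * np.pi * sigma2)
                      - 0.5 * (y[i] - beta @ X[i]) ** 2 / sigma2)
            bonus = phi * np.array([W[i] @ (g == k) for k in range(G)])
            g[i] = int(np.argmax(loglik + bonus))
        if np.array_equal(g, g_old):          # memberships stable -> converged
            break
    return beta, g
```

Each sweep costs $O(nG)$ likelihood evaluations plus $G$ small least-squares fits, which is why the procedure scales to large $n$.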
2.2 Fuzzy clustered regression
Although Algorithm 1 produces interpretable results due to its clustering property,
the spatially clustered structure could be restrictive in terms of estimation
accuracy when region-wise constant functions cannot reasonably approximate the
underlying structure. To overcome this difficulty, we consider a smoothed version of
the proposed method by incorporating fuzzy clustering, which allows for the
uncertainty of clustering by introducing smoothed weights determined by the
likelihood function. Specifically, we consider the following synthetic probability
that the $i$-th location belongs to group $g$ given the other group membership
variables:
$$\pi_{ig} = \frac{\big[ f(y_i \mid x_i; \theta_g) \exp\{\phi \sum_{j=1; j\neq i}^{n} w_{ij} I(g = g_j)\} \big]^{\delta}}{\sum_{g'=1}^{G} \big[ f(y_i \mid x_i; \theta_{g'}) \exp\{\phi \sum_{j=1; j\neq i}^{n} w_{ij} I(g' = g_j)\} \big]^{\delta}},$$
where $\delta$ controls the degree of fuzziness. As $\delta \to \infty$, the maximum
probability among $\{\pi_{i1}, \ldots, \pi_{iG}\}$ converges to 1, resulting in the
same hard clustering as in SCR. We use $\delta = 1$ as the default choice, since
$\pi_{ig}$ can then be seen as the conditional probability of $g_i = g$ given the
data. The iterative algorithm is given as follows.
Algorithm 2. (Spatially fuzzy clustered regression)

1. Set initial values $\theta^{(0)}$ and $g^{(0)}$.

2. Compute the following weights for $i = 1, \ldots, n$ and $g = 1, \ldots, G$:
   $$\pi_{ig}^{(k)} = \frac{\big[ f(y_i \mid x_i; \theta_g^{(k)}) \exp\{\phi \sum_{j=1; j\neq i}^{n} w_{ij} I(g = g_j^{(k)})\} \big]^{\delta}}{\sum_{g'=1}^{G} \big[ f(y_i \mid x_i; \theta_{g'}^{(k)}) \exp\{\phi \sum_{j=1; j\neq i}^{n} w_{ij} I(g' = g_j^{(k)})\} \big]^{\delta}}.$$

3. Update the current parameter values $\theta^{(k)}$ and $g^{(k)}$ as follows:

   - Update the group-wise parameter $\theta_g$ separately for $g = 1, \ldots, G$:
     $$\theta_g^{(k+1)} = \arg\max_{\theta_g} \sum_{i=1}^{n} \pi_{ig}^{(k)} \log f(y_i \mid x_i; \theta_g).$$

   - Update the membership variable: $g_i^{(k+1)} = \arg\max_{g \in \{1,\ldots,G\}} \pi_{ig}^{(k)}$.

4. Repeat steps 2 and 3 until convergence.
Note that the updating step for $\theta_g$ corresponds to maximizing a weighted
objective function, which is easy for typical regression models. The updating
processes for $\pi_{ig}$ and $g_i$ can also be carried out without any computational
difficulty. Based on the outputs of Algorithm 2, we can compute the smoothed
estimator of $\theta_i$ as $\hat\theta_i = \sum_{g=1}^{G} \hat\pi_{ig} \hat\theta_g$.
Although this smoothed estimator does not retain a clustering structure due to the
location-wise mixing rates $\hat\pi_{ig}$, it can flexibly adapt to local changes of
the underlying spatially varying parameters.
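The fuzzy weights of step 2 can be computed in a numerically stable way by working on the log scale, as in this sketch (ours, assuming a weight matrix with zero diagonal so the $j \neq i$ restriction is automatic):

```python
import numpy as np

def fuzzy_weights(loglik, g, W, phi=1.0, delta=1.0):
    """Compute the SFCR membership probabilities pi_ig (Algorithm 2, step 2).

    loglik : (n, G) array of log f(y_i | x_i; theta_g)
    g      : (n,) current hard memberships
    W      : (n, n) spatial weights with zero diagonal
    Returns an (n, G) row-stochastic matrix.
    """
    n, G = loglik.shape
    # spatial bonus: phi * sum_{j != i} w_ij I(g' = g_j)
    bonus = phi * np.stack([W @ (g == k) for k in range(G)], axis=1)
    logits = delta * (loglik + bonus)
    logits -= logits.max(axis=1, keepdims=True)   # avoid overflow in exp
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)
```

The smoothed estimator is then simply the probability-weighted mixture of the group estimates, e.g. `theta_smooth = pi @ theta_hat` for `theta_hat` of shape `(G, p)`; as `delta` grows, each row of `pi` concentrates on a single group and the SCR hard clustering is recovered.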
2.3 Selection of tuning parameters
The proposed method has two tuning parameters: $G$, the number of groups, and
$\phi$, controlling the strength of spatial dependence among the $g_i$'s. Since we
found that the results are not very sensitive to the specific choice of $\phi$ as
long as it is strictly positive, we simply recommend setting $\phi = 1$. Although
the number of groups $G$ could be determined from prior information about the
dataset, a data-dependent choice can be made using the following information
criterion:
$$\mathrm{IC}(G) = -2 \sum_{i=1}^{n} \log f(y_i \mid x_i; \hat\theta_{\hat g_i}) + c_n \dim(\theta), \quad (2)$$
where $c_n$ is a constant depending on the sample size $n$ and $\dim(\theta)$
denotes the dimension of $\theta$, which depends on $G$. Specifically, we use
$c_n = \log n$, which leads to a BIC-type criterion. We select a suitable value of
$G$ as $\hat G = \arg\min_{G \in \{G_1, \ldots, G_L\}} \mathrm{IC}(G)$, where
$G_1, \ldots, G_L$ are candidate values of $G$.
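Operationally, one refits the model for each candidate $G$, records the maximized log-likelihood and parameter dimension, and picks the minimizer of (2); a minimal helper (ours) looks like:

```python
import numpy as np

def ic(loglik_sum, dim_theta, n):
    """BIC-type criterion (2): -2 * maximized log-likelihood + log(n) * dim(theta)."""
    return -2.0 * loglik_sum + np.log(n) * dim_theta

def select_G(fit_results, n):
    """fit_results maps each candidate G to a pair
    (maximized log-likelihood, dim(theta)); returns the G minimizing IC(G)."""
    scores = {G: ic(ll, d, n) for G, (ll, d) in fit_results.items()}
    return min(scores, key=scores.get)
```

The $\log n$ factor penalizes larger $G$ more heavily than AIC would, so the criterion tends to select parsimonious groupings unless the fit improves substantially.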
2.4 Estimation in locations without samples
Suppose we want to estimate $\theta_r$ at some location $r$ without samples. Since
there is no data point at location $r$, there is no likelihood contribution from the
data, and the grouping parameter $g_r$ under SCR can simply be estimated as
$$\hat g_r = \arg\max_{g \in \{1,\ldots,G\}} \sum_{i=1}^{n} w_{ri} I(g = \hat g_i),$$
which results in $\hat\theta_r = \hat\theta_{\hat g_r}$. In a similar way,
estimation at a location $r$ without samples under SFCR can be performed as
$\hat\theta_r = \sum_{g=1}^{G} \hat\pi_{rg} \hat\theta_g$, where
$$\hat\pi_{rg} = \frac{\exp\{\phi \sum_{i=1}^{n} w_{ri} I(g = \hat g_i)\}^{\delta}}{\sum_{g'=1}^{G} \exp\{\phi \sum_{i=1}^{n} w_{ri} I(g' = \hat g_i)\}^{\delta}}.$$
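Both interpolation rules reduce to weighted votes over the estimated memberships of the sampled locations; a short sketch (ours) of the two rules:

```python
import numpy as np

def scr_predict_group(w_r, g_hat, G):
    """SCR rule: assign location r to the group with the largest
    weighted vote sum_i w_ri * I(g = g_hat_i)."""
    votes = np.array([w_r @ (g_hat == k) for k in range(G)])
    return int(np.argmax(votes))

def sfcr_predict_weights(w_r, g_hat, G, phi=1.0, delta=1.0):
    """SFCR rule: pi_rg proportional to exp{phi * sum_i w_ri I(g = g_hat_i)}^delta."""
    logits = delta * phi * np.array([w_r @ (g_hat == k) for k in range(G)])
    logits -= logits.max()                      # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()
```

Here `w_r` is the vector of weights $w_{ri}$ between the new location and the $n$ sampled ones; the SCR prediction is then `theta_hat[scr_predict_group(w_r, g_hat, G)]`, and the SFCR prediction mixes the group estimates with the returned probabilities.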
2.5 Computation of standard errors
To evaluate the uncertainty of the final estimator $\hat\theta_i$, we propose two
approaches. The first approach is a somewhat crude method that computes the
standard errors of $\hat\theta_g$ based on the model $f(y \mid x; \theta_g)$ using
the samples with $\hat g_i = g$, which is easily done as long as the model is
tractable. However, this procedure ignores the estimation error in $\hat g_i$;
thereby, the calculated standard errors may underestimate the true ones. The second
procedure is computationally demanding but valid, using the parametric bootstrap. We
first generate bootstrap samples $y_i^{\ast}$ from the estimated model
$f(\cdot \mid x_i; \hat\theta_{\hat g_i})$, and apply SCR or SFCR to the bootstrap
samples to obtain the bootstrap estimators $\hat\theta_g^{\ast}$ and
$\hat g_i^{\ast}$. Since the bootstrap procedure requires fitting SCR or SFCR to
each replication of the bootstrap samples, it can be computationally intensive under
large spatial data. However, the parametric bootstrap would be feasible in practice
owing to the efficient and scalable optimization algorithm.
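A minimal sketch of the parametric bootstrap for the Gaussian case (ours, with a user-supplied `refit` callback standing in for a full SCR/SFCR refit; note it assumes group labels are aligned across replications, which in practice requires label matching):

```python
import numpy as np

def bootstrap_se(X, beta_hat, sigma_hat, g_hat, refit, B=100, seed=1):
    """Parametric bootstrap standard errors for a fitted Gaussian SCR model.

    X         : (n, p) design matrix
    beta_hat  : (G, p) estimated group coefficients
    sigma_hat : (G,) estimated group standard deviations
    g_hat     : (n,) estimated memberships
    refit     : callable y* -> parameter estimate (same shape each call)
    """
    rng = np.random.default_rng(seed)
    mu = np.sum(X * beta_hat[g_hat], axis=1)        # fitted means per location
    scale = sigma_hat[g_hat]                        # fitted sd per location
    draws = []
    for _ in range(B):
        y_star = mu + scale * rng.standard_normal(len(mu))
        draws.append(refit(y_star))                 # refit SCR/SFCR on y*
    return np.std(np.asarray(draws), axis=0)        # pointwise bootstrap SEs
```

With `B` in the low hundreds the cost is `B` refits, which is why the scalability of Algorithm 1 matters here.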
3 Simulation Studies

3.1 Simulation settings
We present simulation studies to illustrate the performance of the proposed
spatially clustered regression (SCR) and spatially fuzzy clustered regression (SFCR)
methods under two scenarios for the underlying structure of the regression
coefficients. In both scenarios, we uniformly generated $n = 1000$ spatial locations
$s_1, \ldots, s_n$ in the domain
$\{s = (s_1, s_2) \mid s_1 \in [-1, 1],\ s_2 \in [0, 2],\ s_1^2 + 0.5 s_2^2 > (0.5)^2\}$.
Then, we generated two covariates from spatial processes, following Li and Sang
(2019). Let $z_1(s_i)$ and $z_2(s_i)$ be two independent realizations of a spatial
Gaussian process with mean zero and a covariance matrix defined from an isotropic
exponential function: $\mathrm{Cov}(z_k(s_i), z_k(s_j)) = \exp(-\|s_i - s_j\|/\eta)$,
$k = 1, 2$, where $\eta$ is the range parameter. We considered three values of the
parameter, $\eta = 0.2, 0.6, 1$, which are referred to as weak, moderate, and strong
spatial correlation. Then, we defined the two covariates $x_1(s_i)$ and $x_2(s_i)$
via the linear transformations $x_1(s_i) = z_1(s_i)$ and
$x_2(s_i) = r z_1(s_i) + \sqrt{1 - r^2}\, z_2(s_i)$ with $r = 0.75$, which allows
dependence between $x_1(s_i)$ and $x_2(s_i)$. The response at each location is
generated from the following model:
$$y(s_i) = \beta_0(s_i) + \beta_1(s_i) x_1(s_i) + \beta_2(s_i) x_2(s_i) + \sigma(s_i) \varepsilon(s_i), \quad i = 1, \ldots, n,$$
where the $\varepsilon(s_i)$'s are mutually independent and
$\varepsilon(s_i) \sim N(0, 1)$. Regarding the settings of the regression
coefficients and error variance, we considered the following two scenarios:

- (Scenario 1: Spatially clustered parameters) The sampled domain is divided into
  6 regions $D_{jk} = \{s \mid g_{1j} < s_1 \le g_{1,j+1},\ g_{2k} < s_2 \le g_{2,k+1}\}$
  for $j = 0, 1$ and $k = 0, 1, 2$, where $g_{1j} = -1 + j$ and $g_{2k} = 2k/3$. The
  regression coefficients and error variance for locations in $D_{jk}$ are set as
  follows:
  $$\beta_0(s_i) = 2(g_{1j} + g_{2k}), \quad \beta_1(s_i) = g_{1j}^2 + g_{2k}^2, \quad \beta_2(s_i) = -g_{1j} - g_{2k}, \quad \sigma(s_i) = 0.5 + 0.2|g_{1j} - g_{2k}|,$$
  so the regression coefficients and error variance are constant within each region
  $D_{jk}$.

- (Scenario 2: Spatially smoothed parameters) Each regression coefficient was
  independently generated from a Gaussian spatial process. All the processes have
  zero mean and an isotropic exponential covariance function given by
  $$\mathrm{Cov}(\beta_k(s_i), \beta_k(s_j)) = \tau^2 \exp\Big(-\frac{\|s_i - s_j\|}{\psi_k}\Big), \quad k = 0, 1, 2,$$
  where $\psi_k$ is the range parameter and $\tau^2$ is the variance parameter. We
  fix $\tau^2 = 2$ and $\psi_k = k + 1$ in our study. Regarding the error variance,
  we set $\sigma(s_i) = 0.2 \exp(u(s_i))$, where $u(s_i)$ is a zero-mean Gaussian
  spatial process with the isotropic exponential covariance function
  $\mathrm{Cov}(u(s_i), u(s_j)) = (0.5)^2 \exp(-\|s_i - s_j\|/3)$.
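The covariate-generating step above can be sketched as follows (our Python illustration of the stated design, drawing correlated Gaussian-process realizations via a Cholesky factor; a small jitter is added to the covariance matrix for numerical stability):

```python
import numpy as np

def simulate_covariates(s, eta=0.2, r=0.75, seed=0):
    """Generate x1, x2 at locations s (n, 2): two independent GP draws with
    Cov = exp(-||s_i - s_j|| / eta), then x1 = z1 and
    x2 = r * z1 + sqrt(1 - r^2) * z2."""
    rng = np.random.default_rng(seed)
    n = len(s)
    d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=2)  # pairwise distances
    C = np.exp(-d / eta) + 1e-8 * np.eye(n)                    # exponential kernel
    L = np.linalg.cholesky(C)
    z1 = L @ rng.standard_normal(n)
    z2 = L @ rng.standard_normal(n)
    x1 = z1
    x2 = r * z1 + np.sqrt(1.0 - r ** 2) * z2
    return x1, x2
```

The linear mixing with $r = 0.75$ gives the two covariates correlation $r$ while preserving unit marginal variance, matching the design described above.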
3.2 Methods
To the simulated datasets, we applied the proposed SCR with $\phi = 1$ and the
number of groups $G$ selected among $\{5, 10, \ldots, 30\}$ using the BIC-type
criterion (2). We also applied SFCR with $\delta = 1$ and the same value of $G$
selected for SCR. Regarding the weight $w_{ij}$, we adopted two designs: five
nearest neighbors (for the $i$-th location we set $w_{ij} = 1$ for the five nearest
locations and $w_{ij} = 0$ otherwise) and an exponential weight function, namely
$w_{ij} = \exp(-\|s_i - s_j\|^2/(0.1)^2)$, which are denoted by -n and -e,
respectively. As competitors, we adopted two methods. The first is geographically
weighted regression (GWR), the most standard method in spatial regression. Although
spatially varying coefficient models (Gelfand et al., 2003) are also standard,
previous studies suggest that they tend to produce results similar to those of GWR
(Finley, 2011), since they can be regarded as a model-based version of GWR.
Therefore, we only adopted GWR in this study. The bandwidth parameter in GWR was
chosen via cross-validation, and all the estimation procedures were carried out via
the R package "spgwr" (Bivand and Yu, 2020), in which a Gaussian kernel is used as
the spatial weight function. We also applied multiscale GWR (Fotheringham et al.,
2017), which estimates a bandwidth parameter for each covariate, as an advanced
version of GWR, using the R package "GWmodel" (Gollini et al., 2015). The second
competitor is a more advanced and recent regularization technique, spatial
homogeneity pursuit (SHP), proposed by Li and Sang (2019). In this method, we first
constructed a minimum spanning tree connecting all the locations using the R package
"ape" (Paradis and Schliep, 2019), and then lasso-regularized estimation was applied
using the R package "glmnet" (Friedman et al., 2010). Following Li and Sang (2019),
the tuning parameter in the regularized estimation was selected by a BIC-type
criterion.
The estimation performance is evaluated by the mean squared error (MSE), given by
$$\mathrm{MSE} = \frac{1}{np} \sum_{i=1}^{n} \sum_{k=0}^{p-1} \big\{ \hat\beta_k(s_i) - \beta_k(s_i) \big\}^2,$$
where $p = 3$ and $\hat\beta_k(s_i)$ is the estimate of $\beta_k(s_i)$. We also
evaluate the performance in terms of spatial interpolation (estimation in locations
without samples) by generating $m = 100$ additional locations and their true
regression coefficients. Using GWR, SCR, and SFCR, we obtained estimates at the
locations without samples and assessed the performance via the same MSE with $n$
replaced by $m$.
3.3 Results
We first show the results using a single simulated dataset with $\eta = 0.2$. In
Table 1, we report the computation time (seconds) of each method, where the program
was run on a PC with a 3 GHz 8-core Intel Xeon E5 processor and approximately 16GB
RAM. It is observed that the proposed method is computationally comparable with GWR,
whereas SHP is computationally much more intensive than the other methods. For SHP,
we found that the computation of the minimum spanning tree accounts for a large
portion of the total computation time. The spatial patterns of the true and
estimated parameter values are presented in Figures 1 and 2. In Scenario 1, the
proposed method successfully captures the underlying clustered structures of the
regression coefficients and detects the abrupt changes across the boundaries of
adjacent clusters. Note that the selected number of groups was $G = 10$, the
smallest candidate larger than the true number of clusters. On the other hand, GWR
does not provide estimates with clustered structures and produces poor estimates in
some locations. Although SHP can capture similar clustered structures, the proposed
SCR captures the structure more precisely. In Scenario 2, GWR precisely estimates
the spatially smooth regression coefficients, while SCR also produces reasonable
estimates by allowing a large number of groups. In fact, the largest candidate
($G = 30$) was selected in this scenario. Moreover, SFCR produces smoother estimates
than SCR, which are more similar to those of GWR. In contrast, the results of SHP
are not necessarily satisfactory compared with the other methods.
We next report the estimation performance based on 1000 simulated datasets under
weak ($\eta = 0.2$), moderate ($\eta = 0.6$), and strong ($\eta = 1$) spatial
correlation in the covariates. Boxplots of the MSE are given in Figure 3. In
Scenario 1, SCR works better than GWR regardless of the strength of spatial
correlation, owing to the underlying clustered structures of the regression
coefficients. Although SHP takes account of clustered structures, its performance is
not necessarily preferable to the other methods, possibly because SHP implicitly
assumes that the error variance is spatially homogeneous. It is also observed that
the performance of GWR under moderate or strong spatial correlation in the
covariates is not satisfactory. In Scenario 2, the proposed methods (SCR and SFCR)
and GWR are quite comparable when the spatial correlation in the covariates is weak,
whereas the proposed methods tend to perform better than GWR as the spatial
correlation increases. Although the performance of SHP is not preferable under weak
spatial correlation, its relative performance improves as the spatial correlation
increases. Comparing SCR (the hard clustering version) and SFCR (the fuzzy
clustering version), SFCR provides slightly better estimates than SCR, since the
true spatial patterns are smooth.

Finally, we show the results of spatial interpolation (estimation in locations
without samples). Boxplots of the MSE values based on 200 replications are shown in
Figure 4. It is confirmed that the proposed methods provide better interpolation
than GWR, especially when the correlation is moderate or strong, in both scenarios.
It is worth noting that the proposed methods perform better than GWR even when the
underlying spatial distribution of the regression coefficients is smooth.

In summary, the proposed method can produce spatially varying estimates that are as
precise as or more precise than those of the existing methods, while its computation
time is comparable with GWR and much shorter than SHP. Hence, the proposed method
would be a preferable alternative for flexible spatial regression under large
spatial datasets.
Table 1: Computation time (seconds) of the four methods in one simulation.

             GWR     SHP     SCR    SFCR
scenario 1   4.1    93.0     3.5     –
scenario 2   4.3    87.0     1.4    2.2
3.4 Computation time under large spatial data
Finally, we evaluated the scalability of the proposed method under large spatial
datasets. As benchmark methods, we adopted the standard GWR and the recently
proposed scalable version of GWR (Murakami et al., 2020), denoted by SGWR. In this
study, we set $\eta = 0$ (no spatial correlation in the covariates), used Scenario 1
for the regression coefficients, and considered five sample sizes, namely
$n \in \{1000, 3000, 5000, 10000, 20000\}$. For each $n$, we generated 20 datasets
and applied GWR, SGWR, SCR, and SFCR, where the tuning parameters were selected in
the same way as in Section 3.2. The computation times averaged over the 20
replications, with error bars representing two standard deviations, are shown in
Figure 5. The results reveal that the computation time of GWR increases rapidly with
$n$. Although the computation time of the scalable version (SGWR) is considerably
shorter than that of the original GWR, especially for large $n$, the proposed SCR
and SFCR provide consistently shorter computation times than SGWR.
4 Application to crime risk modeling
Here we apply the proposed methods to a dataset of the number of police-recorded
crimes in the Tokyo metropolitan area, provided by the University of Tsukuba and
publicly available online ("GIS database of the number of police-recorded crime at
O-aza, chome in Tokyo, 2009-2017", available at
https://commons.sk.tsukuba.ac.jp/data_en). In this study, we focus on the number of
violent crimes in n = 2,855 local towns in the Tokyo metropolitan area in 2015. As
auxiliary information about each town, we adopted area (km²), entire population
density (PD), day-time population density (DPD), density of foreign people (FD),
percentage of single-person households (SH), and average years of living (AYL). Let
$y_i$ be the observed count of violent crimes, $s_i$ a two-dimensional vector of
longitude and latitude of the center, $a_i$ the area (km²), and $x_i$ the vector of
standardized auxiliary information in the $i$-th local town. For estimating the
structure of the number of crimes explained by the covariates, we employed the
following spatially clustered negative binomial model:
$$y_i \sim \mathrm{NB}(a_i \exp(x_i^t \beta_{g_i}), \nu_{g_i}), \quad i = 1, \ldots, n, \quad (3)$$
where $\beta_{g_i}$ is a vector of unknown regression coefficients, $\nu_{g_i}$ is
an overdispersion parameter, and $\mathrm{NB}(\mu, r)$ denotes the negative binomial
distribution with mean $\mu$ and dispersion $r$. Under the model (3), the
expectation of $y_i/a_i$ is $\exp(x_i^t \beta_{g_i})$, so the regression term can be
interpreted as the crime risk per km².
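For the group-wise update in Algorithm 1 under model (3), each location contributes a negative binomial log-likelihood with an area offset. A sketch of that per-location term (ours; it assumes the common mean–dispersion parameterization with $\mathrm{Var}(y) = \mu + \mu^2/\nu$, which matches a mean-$\mu$, dispersion-$\nu$ negative binomial):

```python
import numpy as np
from math import lgamma

def nb_loglik(y, a, x, beta, nu):
    """Log-likelihood of one observation under model (3):
    y ~ NB(mu, nu) with mu = a * exp(x' beta), where
    P(y) = Gamma(y + nu) / (Gamma(nu) y!) * (nu/(nu+mu))^nu * (mu/(nu+mu))^y."""
    mu = a * np.exp(float(x @ beta))
    return (lgamma(y + nu) - lgamma(nu) - lgamma(y + 1)
            + nu * np.log(nu / (nu + mu)) + y * np.log(mu / (nu + mu)))
```

The update for $(\beta_g, \nu_g)$ then maximizes the sum of these terms over the samples currently assigned to group $g$, e.g. by a generic numerical optimizer, while the membership update plugs the same terms into the penalized comparison of Algorithm 1.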
We first apply the proposed SCR and SFCR. We constructed the spatial adjacency
matrix from the geographical information by choosing the five nearest locations for
each location. We then set $\phi = 1$ and selected the number of groups $G$ from
$G \in \{1, \ldots, 15\}$ based on the BIC-type criterion (2), obtaining $G = 7$ as
the optimal choice. With the selected $G$, we applied SCR and SFCR with
$\delta = 1$. To check the sensitivity to $\delta$ in SFCR, we tried three other
choices ($\delta = 0.5, 2$ and 5), but the estimation results did not change much.
For comparison, we applied geographically weighted negative binomial regression
(da Silva and Rodrigues, 2014), denoted by GWNB, where a Gaussian kernel is used for
the weighting function and the bandwidth is selected via cross-validation.
In Figure 6, we report the estimated spatially varying regression coefficients for
all the covariates. It is observed that GWNB produces estimates that change
drastically over space, and the changes tend to be more drastic around the edges of
the region. In particular, the estimated coefficients of PD and AYL are not very
smooth; thereby, the interpretation of the results is not straightforward. On the
other hand, the proposed SCR method provides reasonable spatial clustering results
and estimates of group-wise regression coefficients, and the results are highly
interpretable compared with those of GWNB, while the overall spatial trends obtained
from both methods are relatively similar. Comparing SCR and SFCR, SFCR tends to
provide slightly smoother estimates than SCR, especially around the boundaries
between clusters. Focusing on SCR and SFCR, their coefficients on PD take large
values in the south and east areas, while those on DPD do so in the northeast areas.
These areas are residential areas. These results suggest high crime risk in densely
populated districts (e.g., shopping districts) within these residential areas. On
the other hand, the coefficients on AYL have large positive values in the central
area. It is known that people tend to commit crimes in areas where they have lived
for a long time because they are familiar with those areas (Bernasco and Kooistra,
2010). Based on the coefficients on AYL, such a tendency is strong in the central
area.
We next investigate the performance of the models in terms of prediction. To this
end, we first randomly removed $m = 200$ locations, which were kept as test data.
Using the remaining training data, we estimated the regression coefficients at the
omitted locations based on SCR, SFCR, and GWNB, where the same values of the number
of groups and bandwidth parameters were adopted. Then, we predicted $y_i$ using the
information in $x_i$ and $a_i$, and the prediction accuracy was assessed via the
following two measures:
$$\mathrm{MAPE} = \frac{1}{m} \sum_{j \in D} \frac{|\hat y_j - y_j|}{y_j + 1}, \qquad \mathrm{RMSE} = \Big\{ \frac{1}{m} \sum_{j \in D} (\hat y_j - y_j)^2 \Big\}^{1/2},$$
where $\hat y_j$ is the predicted value and $D$ is the index set of the test data.
The results are shown in Table 2, which shows that the proposed methods tend to
produce more stable spatial predictions than GWNB.
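The two accuracy measures above are straightforward to compute; a minimal sketch (ours), where the $+1$ in the MAPE denominator guards against zero crime counts:

```python
import numpy as np

def mape(y_hat, y):
    """MAPE = (1/m) * sum_j |y_hat_j - y_j| / (y_j + 1)."""
    return float(np.mean(np.abs(y_hat - y) / (y + 1)))

def rmse(y_hat, y):
    """RMSE = sqrt((1/m) * sum_j (y_hat_j - y_j)^2)."""
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))
```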
Finally, we consider another design of the weight $w_{ij}$ in the proposed methods.
In addition to the spatial adjacency matrix (denoted by $w_{1ij}$), we construct
another adjacency matrix (denoted by $w_{2ij}$) by choosing the five nearest
neighbors in terms of the distance between the covariate vectors $x_i$. We then
define $w_{ij} = (w_{1ij} + w_{2ij})/2$, which takes account of not only
geographical closeness but also covariate similarity. The proposed SCR and SFCR
methods with this covariate-dependent weight design are denoted by SCR-cd and
SFCR-cd. Their performance is investigated through the spatial prediction exercise
using MAPE and RMSE; the results are given in Table 2. The results show that the
prediction performance can be successfully improved by introducing the
covariate-dependent design.
Table 2: Two performance measures, MAPE and RMSE, of the five methods.

        GWNB    SCR   SFCR   SCR-cd   SFCR-cd
MAPE   0.959  0.830  0.870    0.781     0.639
RMSE    6.31   4.26  11.80     4.89      4.16
5 Concluding remarks
This paper proposes a new spatial regression technique, called spatially clustered
regression (SCR), which accounts for spatial heterogeneity in model parameters by
explicitly introducing grouping parameters. By employing a penalty function
motivated by the Potts model, we formulated a penalized likelihood function that is
easily maximized via a simple iterative algorithm. We also developed a fuzzy version
of the method that can produce more spatially smoothed estimates, and considered
straightforward but essential extensions of the main idea. Compared with the most
standard technique, GWR, we numerically confirmed that SCR performs better than
or as well as GWR in terms of parameter estimation, and that the computational
cost of SCR is much smaller than that of GWR for large spatial data.
We finally discuss two meaningful extensions of the proposed method. The first
one is variable selection by incorporating regularization techniques into the objective
function (1), given by

$$\sum_{i=1}^{n}\log f(y_i\mid x_i;\theta_{g_i}) + \phi\sum_{i<j}w_{ij}I(g_i = g_j) - \lambda\sum_{g=1}^{G}\sum_{k=1}^{p}\mathrm{pen}(\theta_{gk}), \tag{4}$$
where λ is a tuning parameter and pen(·) is a penalty function, e.g. pen(x) = |x| for
Lasso regularization (Tibshirani, 1996). Under the formulation, the updating step for
θ in Algorithm 1 is changed as follows:

$$\theta_g^{(k+1)} = \mathop{\mathrm{arg\,max}}_{\theta_g}\left\{\sum_{i=1}^{n} I\bigl(g_i^{(k)} = g\bigr)\log f(y_i\mid x_i;\theta_g) - \lambda\sum_{k=1}^{p}\mathrm{pen}(\theta_{gk})\right\}.$$
The above objective function is the same as the penalized log-likelihood based only on
the samples classified into the gth group; thereby, existing efficient computational
algorithms could be applied to update θg. It should be noted that the objective
function (4) allows a different set of variables to be selected in each group, so (4) does
not necessarily remove any of the p variables in xi from all models. In practice, it may
be more useful to identify variables that are used in none of the G models. To this end,
we also suggest using a grouped penalty function $\sum_{k=1}^{p}\mathrm{pen}(\theta_{1k},\ldots,\theta_{Gk})$ instead of
the element-wise penalty adopted in (4), where $\mathrm{pen}(\theta_{1k},\ldots,\theta_{Gk})$ penalizes the G
regression coefficients of the kth variable simultaneously. A standard choice would be
the grouped lasso penalty (Yuan and Lin, 2006), given by $\mathrm{pen}(\theta_{1k},\ldots,\theta_{Gk}) = \sqrt{\sum_{g=1}^{G}\theta_{gk}^2}$.
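For the Gaussian case, each group-wise update under pen(x) = |x| is an ordinary Lasso problem on the samples currently assigned to that group. A self-contained coordinate-descent sketch, in Python rather than the paper's R code, where the function names and the fixed iteration count are illustrative:

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator used by coordinate-descent Lasso.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_update(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X beta||^2 + lam * ||beta||_1,
    i.e. one group's theta-update under the element-wise L1 penalty."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for k in range(p):
            # Partial residual excluding the k-th coefficient.
            r_k = y - X @ beta + X[:, k] * beta[k]
            beta[k] = soft_threshold(X[:, k] @ r_k, n * lam) / col_ss[k]
    return beta

def update_group_coefs(X, y, groups, G, lam):
    # Run the penalized update separately on the samples of each group.
    return [lasso_update(X[groups == g], y[groups == g], lam) for g in range(G)]
```

Setting lam = 0 recovers the ordinary least-squares update, while a large lam shrinks coefficients exactly to zero within a group.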
Following Zhou et al. (2007), we can modify the information criterion (2) to select
the tuning parameter λ; that is, we replace dim(θ) in (2) with the degrees of freedom
$\sum_{g=1}^{G}\sum_{k=1}^{p} I(\hat{\theta}_{gk}\neq 0)$. In this case, the information criterion is a function of
both G and λ.
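The degrees-of-freedom term is then just a count of nonzero estimated coefficients. A minimal illustration, where the numerical tolerance and example values are hypothetical:

```python
import numpy as np

def df_nonzero(theta_hat, tol=1e-8):
    # Degrees of freedom for the lasso-type fit: the number of nonzero
    # estimated coefficients across all G groups and p variables.
    return int(np.sum(np.abs(theta_hat) > tol))

# Hypothetical estimates with G = 2 groups and p = 3 coefficients each.
theta_hat = np.array([[0.0, 1.2, 0.0],
                      [0.5, 0.0, -0.3]])
print(df_nonzero(theta_hat))  # 3
```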
The second extension is to handle semiparametric structures in the regression
part. Suppose the conditional distribution is expressed as $f(y_i\mid x_i; H_{g_i}, \gamma_{g_i})$, where
$H_g = \{h_{g1},\ldots,h_{gp}\}$ is a collection of p unknown functions and γ is a dispersion
parameter. For example, the additive linear model is expressed as
$y_i \sim N\bigl(\sum_{k=1}^{p} h_{g_i k}(x_{ik}), \sigma_{g_i}^2\bigr)$, so that $E[y_i\mid x_i] = \sum_{k=1}^{p} h_{g_i k}(x_{ik})$. In this model,
the additive effect of each covariate can differ among the G groups, and the model can
be seen as a semiparametric version of the model discussed in Section 2. The
estimation of the model can be done via a slight modification of Algorithms 1 and 2:
the updating step for θg is replaced with one for Hg, given by

$$H_g^{(k+1)} = \mathop{\mathrm{arg\,max}}_{H_g}\sum_{i=1}^{n} I\bigl(g_i^{(k)} = g\bigr)\log f\bigl(y_i\mid x_i; H_g, \gamma_g^{(k)}\bigr).$$
The above optimization step is nothing but fitting a generalized additive model to the
observations assigned to the gth group, so that standard techniques such as sequential
fitting (e.g. Hastie and Tibshirani, 1986) can be adopted to obtain $H_g^{(k+1)}$. The
dispersion parameter γg can be updated in the same manner. The two extensions
mentioned above would be helpful in practice, but their detailed theoretical and
numerical investigation is left for future work.
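As an illustration of this group-wise updating, the sketch below fits a separate additive model to each group by least squares on a polynomial basis. This is a crude stand-in for full GAM fitting, written in Python rather than R, and all names are hypothetical:

```python
import numpy as np

def fit_group_additive(X, y, groups, G, degree=3):
    """Per-group M-step for the semiparametric extension: within each group,
    fit an additive Gaussian model by least squares on a polynomial basis
    for every covariate (a crude stand-in for full GAM fitting)."""
    fits = []
    for g in range(G):
        Xg, yg = X[groups == g], y[groups == g]
        # Basis expansion: intercept plus powers 1..degree of each covariate.
        B = np.column_stack([np.ones(len(yg))] +
                            [Xg ** d for d in range(1, degree + 1)])
        coef, *_ = np.linalg.lstsq(B, yg, rcond=None)
        fits.append(coef)
    return fits
```

Replacing the polynomial basis with penalized splines, fitted by backfitting within each group, would bring the sketch closer to the generalized additive models referred to in the text.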
Acknowledgements
This work was supported by the Japan Society for the Promotion of Science (KAKENHI)
Grant Numbers 18K12757 and 18H03628.
References
Anselin, L. (1990). Spatial dependence and spatial structural instability in applied
regression analysis. Journal of Regional Science 30(2), 185–207.
Bárcena, M. J., P. Menéndez, M. B. Palacios, and F. Tusell (2014). Alleviating the
effect of collinearity in geographically weighted regression. Journal of Geographical
Systems 16(4), 441–466.
Bernasco, W. and T. Kooistra (2010). Effects of residential history on commercial
robbers' crime location choices. European Journal of Criminology 7(4), 251–265.
Billé, A. G., R. Benedetti, and P. Postiglione (2017). A two-step approach to account
for unobserved spatial heterogeneity. Spatial Economic Analysis 12(4), 452–471.
Bivand, R. and D. Yu (2020). spgwr: Geographically Weighted Regression. R package
version 0.6-33.
Bonhomme, S. and E. Manresa (2015). Grouped patterns of heterogeneity in panel
data. Econometrica 83, 1147–1184.
Brunsdon, C., A. Fotheringham, and M. Charlton (1998). Geographically weighted
regression – modelling spatial non-stationarity. Journal of the Royal Statistical
Society: Series D 47, 431–443.
Cho, S., D. M. Lambert, S. G. Kim, and S. Jung (2009). Extreme coefficients in
geographically weighted regression and their effects on mapping. GIScience &
Remote Sensing 46(3), 273–288.
Comber, A., P. Harris, N. Quan, K. Chi, T. Hung, and H. Phe (2016). Local variation
in hedonic house price, Hanoi: a spatial analysis of SQTO theory. In International
Conference on GIScience: Short Paper Proceedings, Volume 1, pp. 54–59.
da Silva, A. R. and T. C. V. Rodrigues (2014). Geographically weighted negative
binomial regression—incorporating overdispersion. Statistics and Computing 24(5),
769–783.
Finley, A. O. (2011). Comparing spatially-varying coefficients models for analysis of
ecological data with non-stationary and anisotropic residual dependence. Methods
in Ecology and Evolution 2, 143–154.
Fotheringham, A., C. Brunsdon, and M. Charlton (2002). Geographically Weighted
Regression. Wiley, West Sussex.
Fotheringham, A. S., W. Yang, and W. Kang (2017). Multiscale geographically
weighted regression (MGWR). Annals of the American Association of
Geographers 107(6), 1247–1265.
Friedman, J., T. Hastie, and R. Tibshirani (2010). Regularization paths for generalized
linear models via coordinate descent. Journal of Statistical Software 33, 1–22.
Gelfand, A. E., H. Kim, C. F. Sirmans, and S. Banerjee (2003). Spatial modeling
with spatially varying coefficient processes. Journal of the American Statistical
Association 98, 387–396.
Gollini, I., B. Lu, M. Charlton, C. Brunsdon, P. Harris, et al. (2015). GWmodel: An R
package for exploring spatial heterogeneity using geographically weighted models.
Journal of Statistical Software 63(17).
Goodchild, M. F. (2004). The validity and usefulness of laws in geographic information
science and geography. Annals of the Association of American Geographers 94(2),
300–303.
Hastie, T. and R. Tibshirani (1986). Generalized additive models. Statistical Sci-
ence 1, 297–310.
Hu, S., S. Yang, W. Li, C. Zhang, and F. Xu (2016). Spatially non-stationary
relationships between urban residential land price and impact factors in Wuhan
City, China. Applied Geography 68, 48–56.
Ito, T. and S. Sugasawa (2020).
Clustered GEE analysis for longitudinal data.
arXiv:2006.06180.
Lee, J., R. E. Gangnon, and J. Zhu (2017). Cluster detection of spatial regression
coefficients. Statistics in Medicine 36(7), 1118–1133.
Li, F. and H. Sang (2019). Spatial homogeneity pursuit of regression coefficients for
large datasets. Journal of the American Statistical Association 114, 1050–1062.
Murakami, D., N. Tsutsumida, T. Yoshida, T. Nakaya, and B. Lu (2020). Scalable
GWR: A linear-time algorithm for large-scale geographically weighted regression with
polynomial kernels. Annals of the American Association of Geographers, 1–22.
Nakaya, T., A. S. Fotheringham, C. Brunsdon, and M. Charlton (2005). Geographically
weighted Poisson regression for disease association mapping. Statistics in
Medicine 24(17), 2695–2717.
Nicholson, D., O. A. Vanli, S. Jung, and E. E. Ozguven (2019). A spatial regression
and clustering method for developing place-specific social vulnerability indices using
census and social media data. International Journal of Disaster Risk Reduction 38,
101224.
Paradis, E. and K. Schliep (2019). ape 5.0: an environment for modern phylogenetics
and evolutionary analyses in R. Bioinformatics 35, 526–528.
Potts, R. B. (1952). Some generalized order-disorder transformations. Mathematical
Proceedings of the Cambridge Philosophical Society 48, 106–109.
Sugasawa, S. (2020). Grouped heterogeneous mixture modeling for clustered data.
Journal of the American Statistical Association, to appear.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of
the Royal Statistical Society: Series B 58, 267–288.
Wang, W., P. C. B. Phillips, and L. Su (2018). Homogeneity pursuit in panel data
models: Theory and application. Journal of Applied Econometrics 33, 797–815.
Wheeler, D. and M. Tiefelsdorf (2005). Multicollinearity and correlation among local
regression coefficients in geographically weighted regression. Journal of
Geographical Systems 7(2), 161–187.
Wheeler, D. C. (2007). Diagnostic tools and a remedial method for collinearity in
geographically weighted regression. Environment and Planning A 39(10), 2464–
2481.
Wheeler, D. C. (2009). Simultaneous coefficient penalization and model selection in
geographically weighted regression: the geographically weighted lasso. Environment
and Planning A 41(3), 722–742.
Wheeler, D. C. and L. A. Waller (2009). Comparing spatially varying coefficient
models: a case study examining violent crime rates and their relationships to alcohol
outlets and illegal drug arrests. Journal of Geographical Systems 11(1), 1–22.
Wolf, L. J., T. M. Oshan, and A. S. Fotheringham (2018). Single and multiscale
models of process spatial heterogeneity. Geographical Analysis 50(3), 223–246.
Yuan, M. and Y. Lin (2006).
Model selection and estimation in regression with
grouped variables. Journal of the Royal Statistical Society: Series B 68, 49–67.
Zhao, Y. and H. Bondell (2020). Solution paths for the generalized lasso with
applications to spatially varying coefficients regression. Computational Statistics &
Data Analysis 142, 106821.
Zhou, H., T. Hastie, and R. Tibshirani (2007). On the “degrees of freedom” of the
lasso. The Annals of Statistics 35, 2173–2192.
Zhou, Q., C. Wang, and S. Fang (2019). Application of geographically weighted
regression (GWR) in the analysis of the cause of haze pollution in China. Atmospheric
Pollution Research 10(3), 835–846.
[Figure 1 consists of map panels labeled True, GWR, SHP, SCR, and SFCR, plotted over Longitude and Latitude; the image itself is not recoverable from this text extraction.]

Figure 1: Scenario 1: spatial patterns of true and estimated regression coefficients
based on GWR, SHP, SCR and SFCR in one simulation with the spatial range parameter
η = 0.2 for covariates. The left, center and right columns correspond to β0, β1 and
β2, respectively.
[Figure 2 consists of map panels labeled True, GWR, SHP, SCR, and SFCR, plotted over Longitude and Latitude; the image itself is not recoverable from this text extraction.]

Figure 2: Scenario 2: spatial patterns of true and estimated regression coefficients
based on GWR, SHP, SCR, SFCR in one simulation with the spatial range parameter
η = 0.2 for covariates. The left, center and right columns correspond to β0, β1 and
β2, respectively.
[Figure 3 consists of six boxplot panels (Scenarios 1 and 2 under weak, moderate and strong correlation) comparing GWR, MGWR, SHP, SCR-n, SFCR-n, SCR-e and SFCR-e; the image itself is not recoverable from this text extraction.]

Figure 3: Boxplot of MSE for GWR, SHP, SCR and SFCR based on 1000 simulated
datasets.
[Figure 4 consists of six boxplot panels (Scenarios 1 and 2 under weak, moderate and strong correlation) comparing GWR, SCR and SFCR; the image itself is not recoverable from this text extraction.]

Figure 4: Boxplot of MSE for GWR, SCR and SFCR in terms of spatial interpolation
based on 200 simulated datasets.
[Figure 5 is a line plot of computation time (seconds) against sample size (1000 to 20000) for GWR, SGWR, SCR and SFCR; the image itself is not recoverable from this text extraction.]

Figure 5: Computation time (seconds) of GWR, SGWR, SCR and SFCR under large
samples, averaged over 20 replications. The vertical lines represent two standard
deviations of the replicated computation times.
[Figure 6 consists of map panels of estimated coefficients for PD, DPD, FPD, SH and ALY under GWNB (panels labeled GWR), SCR and SFCR; the image itself is not recoverable from this text extraction.]

Figure 6: Estimates of spatially varying coefficients for the five covariates, PD, DPD,
FPD, SH, and ALY, based on geographically weighted negative binomial regression
(GWNB), spatially clustered regression (SCR), and spatially fuzzy clustered
regression (SFCR).